Data-intensive applications
When you’re dealing with the energy market, you’re dealing with big data.
In Belgium, customers can easily switch between energy suppliers, and they often do. The suppliers then have to turn to the clearing house Atrias for updated information, drawing from a veritable jungle of data.
Enter Kapernikov, to bring order to the chaos.
The Belgian energy market was liberalised in 2007. Since then, gas and electricity suppliers have sprung up like mushrooms, competing fiercely for every customer. People switch back and forth easily, especially now that sustainability has become an additional motivator. Meanwhile, distribution grid operators have the climate goals in their sights and are working hard to build a greener, smarter power grid. To tackle the data challenges, they jointly founded the clearing house Atrias.
Atrias runs a Central Market System (CMS) to simplify data exchange between all the parties: the grid operators, the energy transport companies and the actual suppliers. That’s a lot of clients to cater to, and each of them needs its own tailored information. On top of the CMS, Atrias manages an Operational Data Store (ODS), which produces daily aggregates. For the much-needed quality control of this ODS data, Atrias called in Kapernikov.
To fully understand the challenge, try not to be dazzled by all the factors involved.
Kapernikov built a framework that guarantees the quality of the ODS by comparing its actual output with the required one. If the two match, the application can be trusted to deliver quality data. Normally, such a comparison would involve lots of manual checks, which is an awful lot of work and only yields superficial results. Remember, we are dealing with billions of records. Automation is key. But how? Sit tight, because this is where it gets technical.
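To give a feel for the principle (a simplified sketch rather than the actual framework; the server, table and column names below are invented for illustration), the comparison itself can be pushed down to the database: recompute the expected aggregates from the CMS source data, join them against what the ODS actually delivered, and keep only the rows that disagree.

import pyodbc  # assumes the Microsoft ODBC driver for SQL Server is installed

# Hypothetical connection; in reality this would point at the QA warehouse.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=dwh;DATABASE=qa;Trusted_Connection=yes;"
)

# Let SQL Server do the heavy lifting: a full outer join between the
# aggregates the ODS actually delivered and the aggregates recomputed
# from the CMS source data, keeping only the rows that disagree.
DIFF_QUERY = """
SELECT COALESCE(e.grid_operator, a.grid_operator) AS grid_operator,
       COALESCE(e.agg_date, a.agg_date)           AS agg_date,
       e.volume                                   AS expected_volume,
       a.volume                                   AS actual_volume
FROM expected_aggregates AS e
FULL OUTER JOIN actual_aggregates AS a
  ON a.grid_operator = e.grid_operator AND a.agg_date = e.agg_date
WHERE a.volume IS NULL OR e.volume IS NULL OR a.volume <> e.volume;
"""

mismatches = conn.cursor().execute(DIFF_QUERY).fetchall()
print(f"{len(mismatches)} aggregate rows need a closer look")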
Our developers first screened the data from both the CMS and ODS databases against Atrias’ logical data model (LDM). The LDM describes how the business is run: the required data (for example, every contract has to be linked to a person or company), the relations between the data, and so on. The logical model applies to both databases, but their technical implementations differ considerably. Which data went into which table, in which format? That was just one of many questions.
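To make that concrete, here is what one such screening rule could look like, with hypothetical table and column names: the LDM requirement that every contract is linked to a person or company boils down to hunting for orphaned records.

# One LDM rule expressed as a screening query: every contract must be
# linked to a party (a person or a company). Names are illustrative only.
ORPHAN_CONTRACTS = """
SELECT c.contract_id
FROM contracts AS c
LEFT JOIN parties AS p ON p.party_id = c.party_id
WHERE p.party_id IS NULL;
"""

def screen(cursor, rule_name: str, query: str) -> int:
    """Run one screening rule and report the number of violating records."""
    violations = cursor.execute(query).fetchall()
    print(f"{rule_name}: {len(violations)} violations")
    return len(violations)

# Usage, given a cursor on the warehouse (for example the connection opened
# in the previous sketch):
#   screen(conn.cursor(), "contract linked to a party", ORPHAN_CONTRACTS)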
To solve the riddle, Kapernikov brought its own new data store into play. Essentially, it converts the data from both databases to a canonical data model, so that they speak the same language. Rather than hand-writing endless individual SQL queries that all implement the same logic, it defines each piece of functionality once and dynamically generates the code from there. Easy does it. Any correction only has to be made in one place, rather than in hundreds or thousands. This approach also greatly reduces the risk of flaws in the code.
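The toy sketch below illustrates that “define once, generate many” idea; the canonical model and the generated check are made up, and the real metadata and queries are of course far richer.

# A toy canonical model: each entity records where its data lives in the
# CMS and in the ODS (all table and column names here are invented).
CANONICAL_ENTITIES = {
    "contract": {
        "cms": {"table": "cms.Contracts", "key": "ContractId"},
        "ods": {"table": "ods.contract_agg", "key": "contract_id"},
    },
    "metering_point": {
        "cms": {"table": "cms.MeteringPoints", "key": "MpId"},
        "ods": {"table": "ods.metering_point_agg", "key": "mp_id"},
    },
}

def count_query(entity: str, source: str) -> str:
    """Generate the SQL for one routine check from the shared definition."""
    spec = CANONICAL_ENTITIES[entity][source]
    return (
        f"SELECT '{entity}' AS entity, COUNT(DISTINCT {spec['key']}) AS n "
        f"FROM {spec['table']};"
    )

# One generator, many queries: a correction to the logic is made in one
# place and propagates to every generated statement.
for entity in CANONICAL_ENTITIES:
    for source in ("cms", "ods"):
        print(count_query(entity, source))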
The developers streamlined the work by organising it into DAGs (directed acyclic graphs: tasks that run in a specific order, each building on the previous ones). They applied modern data pipeline practices and used Luigi as the dataflow orchestration tool. To keep the flow lean, the data warehouse itself takes care of all data transformations, expressed as SQL queries. Long story short: it was all about parallelisation of tasks, huge data throughput and logical transparency.
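A minimal Luigi sketch of that pattern could look like the following (the task names, file targets and date parameter are hypothetical): each task declares its dependencies, Luigi derives the DAG and runs independent tasks in parallel, and the heavy transformations stay inside the warehouse as SQL.

import datetime

import luigi


class LoadCanonicalModel(luigi.Task):
    """Convert one day of CMS and ODS data into the canonical model."""
    run_date = luigi.DateParameter()

    def output(self):
        return luigi.LocalTarget(f"targets/canonical_{self.run_date}.done")

    def run(self):
        # In the real pipeline this step would submit (generated) SQL to the
        # warehouse; here we only write a marker file to flag completion.
        with self.output().open("w") as marker:
            marker.write("done")


class CompareAggregates(luigi.Task):
    """Compare the aggregates the ODS delivered against the expected ones."""
    run_date = luigi.DateParameter()

    def requires(self):
        # The dependency that turns these tasks into a DAG.
        return LoadCanonicalModel(run_date=self.run_date)

    def output(self):
        return luigi.LocalTarget(f"targets/diff_report_{self.run_date}.csv")

    def run(self):
        # Again, the comparison query itself would run in the warehouse;
        # the task only collects the result into a report.
        with self.output().open("w") as report:
            report.write("grid_operator,agg_date,expected,actual\n")


if __name__ == "__main__":
    # Run the small DAG for today's data with the local scheduler.
    luigi.build([CompareAggregates(run_date=datetime.date.today())],
                local_scheduler=True)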
Our approach allowed us to verify billions of records and spot even the tiniest error or inconsistency. From there, all that was required was analysing the root causes and weeding them out with Atrias’ team. The quality of the data in the ODS improved, and even long-lived glitches in the CMS could now be fixed. Both Atrias and the grid operators can rest assured about their reporting system. In the end, it all looks simple.
Technologies used: Python, dynamically generated SQL, SQL Server